Clustering short push-to-talk segments

Authors

  • Ilya Shapiro
  • Neta Rabin
  • Irit Opher
  • Itshak Lapidot
Abstract

We present a method for clustering short push-to-talk speech segments for different numbers of speakers. An iterative Mean Shift algorithm based on the cosine distance performs speaker clustering on i-vectors generated from many short speech segments. We report results in terms of accuracy, the average number of detected speakers (ANDS), the average cluster purity (ACP), the average speaker purity (ASP), and K. We achieve clustering accuracies of 90.0%, 86.9%, and 72.1% for 3, 15, and 60 speakers, respectively.
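To make the clustering step concrete, here is a minimal sketch of plain Mean Shift with a flat kernel under the cosine distance. It is a generic illustration, not the authors' iterative variant: the function name `cosine_mean_shift`, the `threshold` bandwidth, the mode-merging rule, and the use of generic row vectors in place of real i-vectors are all assumptions made for this example.

```python
import numpy as np

def cosine_mean_shift(x, threshold=0.3, max_iter=100, tol=1e-6):
    """Cluster the rows of x with flat-kernel Mean Shift under cosine distance.

    threshold is a hypothetical bandwidth (maximum cosine distance to count
    as a neighbour); it is not a value taken from the paper.
    """
    # Length-normalise so that cosine distance is 1 - dot product.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    modes = x.copy()
    for _ in range(max_iter):
        # Cosine distance from every current mode to every original point.
        dist = 1.0 - modes @ x.T
        neighbours = dist <= threshold
        # Shift each mode to the (re-normalised) mean of its neighbours.
        shifted = neighbours.astype(float) @ x
        empty = neighbours.sum(axis=1) == 0
        shifted[empty] = modes[empty]  # keep a mode that lost all neighbours
        shifted /= np.linalg.norm(shifted, axis=1, keepdims=True)
        moved = np.max(1.0 - np.sum(shifted * modes, axis=1))
        modes = shifted
        if moved < tol:
            break
    # Merge modes that converged to nearly the same direction (assumed rule).
    labels = np.full(len(x), -1, dtype=int)
    centres = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centres):
            if 1.0 - m @ c <= threshold / 2:
                labels[i] = k
                break
        else:
            centres.append(m)
            labels[i] = len(centres) - 1
    return labels, np.array(centres)
```

The number of clusters is not fixed in advance; it falls out of how many distinct modes survive the merge, which is why a measure such as the average number of detected speakers is meaningful for this kind of method.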


Similar resources

JANUS-II: Towards Spontaneous Spanish Speech Recognition (Puming Zhan, in Proceedings of ICSLP-96)

JANUS-II is a research system for investigating various issues in speech-to-speech translation and has been implemented for speech-to-speech translation on many languages [1]. In this paper, we address the Spanish speech recognition part of JANUS-II. First, we report the bootstrap and optimization of the recognition system. Then we investigate the difference between push-to-talk and cross-talk ...

Multimodal Speaker Diarization Utilizing Face Clustering Information

Multimodal clustering/diarization tries to answer the question "who spoke when" by using audio and visual information. Diarization consists of two steps: first, segmentation of the audio and detection of the speech segments; then, clustering of the speech segments to group them by speaker. This task has been mainly studied on audiovisual data from meetings, news broadcasts or talk ...


NTP-PoCT: a conformance test tool for push-to-talk over cellular network

Push-to-talk over Cellular (PoC) provides a walkie-talkie-like service in the cellular telecommunications network [9]. In this service, several predefined PoC group members participate in one PoC session. Since the PoC session is half-duplex, only one group member speaks at a time while the others listen. Therefore, a user must ask for permission to speak by pressing the push-to-talk button. ...


Estimating Speaker Clustering Quality Using Logistic Regression

This paper focuses on estimating clustering validity by using logistic regression. For many applications it can be important to estimate the quality of the clustering, e.g. when clustering speech segments, to decide whether to use the clustered data for speaker verification. In the case of short-segment speaker clustering, the common criteria for cluster validity are average cl...




Publication date: 2015